A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.

stylized

transform

editing

flux-pro/kontext

image-to-image

FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.

flux-kontext-lora

image-to-image

Fast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.

image-editing

Veo 3.1

new

veo3.1/fast/first-last-frame-to-video

image-to-video

Generate videos from a first and last frame using Google's Veo 3.1 Fast

new

veo3.1/first-last-frame-to-video

image-to-video

Generate videos from a first and last frame using Google's Veo 3.1

new

veo3.1/fast

text-to-video

Faster and more cost-effective version of Google's Veo 3.1!

new

veo3.1/fast/image-to-video

image-to-video

Generate videos from your image prompts using Veo 3.1 fast.

new

veo3.1/image-to-video

image-to-video

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind

new

veo3.1

text-to-video

Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on!

new

veo3.1/reference-to-video

image-to-video

Generate videos from images using Google's Veo 3.1

Sora 2

new

sora-2/image-to-video/pro

image-to-video

Image-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

audio

sora-2-pro

new

sora-2/text-to-video/pro

text-to-video

Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

sora-2/image-to-video

image-to-video

Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

audio

sora

new

sora-2/video-to-video/remix

video-to-video

Video-to-video remix endpoint for Sora 2, OpenAI’s advanced model that transforms existing videos based on new text or image prompts, allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure.

video to video

audio

sora

Marquee Video Models

kling-video/v2.5-turbo/pro/image-to-video

image-to-video

Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

stylized

transform

kling-video/v2.5-turbo/pro/text-to-video

text-to-video

Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

animation

stylized

decart/lucy-5b/image-to-video

image-to-video

Lucy-5B creates 5-second image-to-video (I2V) clips in under 5 seconds, achieving a real-time factor (RTF) above 1x end-to-end.
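The real-time factor claim above can be checked with simple arithmetic. The sketch below assumes RTF is defined as output video duration divided by wall-clock generation time, so a value above 1x means the clip is produced faster than it plays back. The function name and the 4-second timing are hypothetical and for illustration only; they are not measurements of Lucy-5B.

```python
def real_time_factor(video_seconds: float, generation_seconds: float) -> float:
    """Return video duration divided by generation time (assumed RTF convention)."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return video_seconds / generation_seconds


# Hypothetical timing: a 5-second clip generated in 4 seconds of wall-clock time.
rtf = real_time_factor(video_seconds=5.0, generation_seconds=4.0)
print(f"{rtf:.2f}x")  # 1.25x
```

Under this convention, any generation time below 5 seconds for a 5-second clip yields an RTF above 1x, which matches the description.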

kling-video/v2.1/pro/image-to-video

image-to-video

Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling.

minimax/hailuo-02/standard/image-to-video

image-to-video

MiniMax Hailuo-02 Image To Video API (Standard): advanced image-to-video generation model supporting 768p and 512p resolutions

pixverse/v5/image-to-video

image-to-video

Generate high quality video clips from text and image prompts using PixVerse v5

stylized

transform

wan/v2.2-a14b/image-to-video

image-to-video

fal-ai/wan/v2.2-A14B/image-to-video

ltxv-13b-098-distilled/image-to-video

image-to-video

Generate long videos from prompts and images using LTX Video-0.9.8 13B Distilled and custom LoRA

video

ltx-video

new

veo3.1/image-to-video

image-to-video

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind

Best Avatar Models

creatify/lipsync

video-to-video

Realistic lipsync video - optimized for speed, quality, and consistency.

bytedance/omnihuman

image-to-video

OmniHuman generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.

lipsync

ai-avatar/single-text

image-to-video

MultiTalk model generates a talking avatar video from an image and text. Converts text to speech automatically, then generates the avatar speaking with lip-sync.

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with Sync Lipsync 2.0 model

animation

lip sync

kling-video/v2.1/master/image-to-video

image-to-video

Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

pixverse/lipsync

video-to-video

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with PixVerse Lipsync model

animation

lip sync

kling-video/v1/pro/ai-avatar

image-to-video

Kling AI Avatar Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

stylized

transform

Everything Kontext

Explore the best Flux Kontext offerings: top-tier base models, curated LoRA adapters, and the official LoRA Trainer endpoint.

flux-kontext-lora/inpaint

flux-kontext-lora/text-to-image

text-to-image

flux-kontext-lora

image-to-image

flux-pro/kontext/max/multi

image-to-image

flux-pro/kontext/multi

image-to-image

flux-pro/kontext/max

image-to-image

flux-pro/kontext/max/text-to-image

text-to-image

Audio Models

chatterbox/text-to-speech

text-to-speech

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS from Resemble AI.

playai/tts/dialog

text-to-audio

Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.

audio

minimax/speech-02-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

dia-tts/voice-clone

audio-to-audio

Clone dialog voices from an audio sample and generate dialogs from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

mirelo-ai/sfx-v1/video-to-audio

video-to-audio

Generate synced sounds for any video, and return the new sound track (like MMAudio)

sfx

mirelo-ai/sfx-v1/video-to-video

video-to-video

Generate synced sounds for any video, and return it with its new sound track (like MMAudio)

sfx

new

beatoven/music-generation

text-to-audio

Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.

beatoven/sound-effect-generation

text-to-audio

Create professional-grade sound effects from animal and vehicle to nature, sci-fi, and otherworldly sounds. Perfect for films, games, and digital content.

sfx

audio

effects

Best LoRA Trainers

flux-lora-portrait-trainer

training

FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results.

LoRA trainer for FLUX.1 Kontext [dev]

flux-lora-fast-training

training

Train styles, people and other subjects at blazing speeds.

Train custom LoRAs for Wan-2.1 T2V 14B

lora

flux-pro-trainer

training

FLUX LoRA for Pro endpoints.

lora

personalization

Best Image Models

imagen4/preview

text-to-image

Google’s highest quality image generation model

flux-pro/kontext

image-to-image

FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.

flux-krea-lora/stream

text-to-image

Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

lora

personalization

recraft/v3/text-to-image

text-to-image

Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.

HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.

Best Utility Models

x-ailab/nsfw

vision

Predict whether an image is NSFW or SFW.

Use the powerful and accurate Topaz image enhancer to enhance your images.

bria/video/background-removal

video-to-video

Automatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen.

background-removal

bria/background/remove

image-to-image

Bria RMBG 2.0 enables seamless removal of backgrounds from images, ideal for professional editing tasks. Trained exclusively on licensed data for safe and risk-free commercial use. Model weights for commercial use are available here: https://share-eu1.hsforms.com/2GLpEVQqJTI2Lj7AMYwgfIwf4e04

Professional-grade video upscaling using Topaz technology. Enhance your videos with high-quality upscaling.

upscaling

high-res

Best of Open Source

Some of our favorite open source media models

flux-kontext-trainer

training

LoRA trainer for FLUX.1 Kontext [dev]

ltx-video-13b-distilled/image-to-video

image-to-video

Generate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRA

Wan 2.2 text to image LoRA trainer. Fine-tune Wan 2.2 for subjects and styles with unprecedented detail.

lora

personalization

wan/v2.2-a14b/image-to-video/lora

image-to-video

Wan-2.2 image-to-video is a video model that generates videos with high visual quality and motion diversity from text prompts and images. This endpoint supports LoRAs made for Wan 2.2.

Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing.

flux-krea-lora/stream

text-to-image

lora

personalization